Use log-scaled quantile sketch budgets and rank-based accuracy checks #12129
RAMitchell wants to merge 10 commits into dmlc:master
…uantile-logn-budget # Conflicts: # tests/cpp/common/test_hist_util.cu
…uantile-logn-budget # Conflicts: # tests/cpp/common/test_hist_util.cc # tests/cpp/common/test_hist_util.cu # tests/cpp/common/test_hist_util.h
Pull request overview
This PR updates quantile sketch budgeting to follow the same O(log n / eps) summary-size behavior as the single-machine sketch (including distributed CPU merge/prune), and refreshes test coverage to validate the rank-error contract instead of comparing cut values directly.
Changes:
- Track per-feature represented element counts in `WQuantileSketch`, serialize them in the distributed CPU sketch allreduce payload, and recompute merge/prune budgets from those counts.
- Route multiple CPU/GPU sketch sizing paths through shared budget helpers (`SketchSummaryBudget`), including the GPU intermediate prune target.
- Replace/extend C++ and Python tests to use rank-based cut validation (plus exact-cut coverage when the budget can retain all unique values).
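A minimal sketch of what a rank-based cut check can look like, as opposed to comparing cut values directly. The helper name `max_normalized_rank_error`, the convention that cut `i` is the exclusive upper bound of bin `i`, and the normalization by one ideal bin's weight are all assumptions made for illustration; they are not taken from the helpers in this PR.

```python
import bisect


def max_normalized_rank_error(values, cuts, weights=None):
    """Return the worst rank error over all cuts, in units of one ideal bin.

    Hypothetical helper for illustration only; not the helper added by this PR.
    """
    if weights is None:
        weights = [1.0] * len(values)
    pairs = sorted(zip(values, weights))
    sorted_values = [v for v, _ in pairs]

    # prefix[i] is the total weight of the i smallest elements.
    prefix = [0.0]
    for _, w in pairs:
        prefix.append(prefix[-1] + w)

    total = prefix[-1]
    ideal_bin = total / len(cuts)  # weight per bin if the cuts were perfectly balanced

    worst = 0.0
    for i, cut in enumerate(cuts, start=1):
        # Assumed convention: cut i is the exclusive upper bound of bin i,
        # so its rank is the weight of all elements strictly below it.
        rank = prefix[bisect.bisect_left(sorted_values, cut)]
        expected = i * ideal_bin
        worst = max(worst, abs(rank - expected) / ideal_bin)
    return worst


values = list(range(100))
balanced = max_normalized_rank_error(values, [25, 50, 75, 100])  # every cut on target
shifted = max_normalized_rank_error(values, [30, 50, 75, 100])   # first cut is 5 rows late
```

One reason to prefer this kind of check: on heavy-tailed data, two cut sets can differ by a huge amount in value while splitting the rows identically. With `list(range(99)) + [10**9]`, a last cut of `10**9 + 1` and a last cut of `2 * 10**9` both have zero rank error under this measure, even though a value-delta comparison would flag them as very different.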
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/common/quantile.h` | Adds element-count tracking to WQuantileSketch and an exact-summary fast path for sorted weighted input. |
| `src/common/quantile.cc` | Extends distributed sketch payload to include element counts and uses SketchSummaryBudget during merge/prune and sorted ingestion. |
| `src/common/quantile.cu` | Uses SketchSummaryBudget for GPU intermediate pruning instead of a local helper. |
| `src/common/quantile.cuh` | Removes IntermediateNumCuts() helper (now replaced by shared budget helper usage). |
| `src/common/hist_util.cu` | Switches sample-cut sizing to SketchSummaryBudget. |
| `tests/cpp/common/test_hist_util.h` | Tightens/aligns rank-error thresholds, updates exact-value validation, and adds a weight-aware validation wrapper. |
| `tests/cpp/common/test_hist_util.cu` | Uses the new weight-aware validation wrapper for GPU sketch tests. |
| `tests/cpp/common/test_hist_util.cc` | Adjusts rank-error validation for weighted CPU cases and adds a sorted weighted exact-cut regression test. |
| `tests/cpp/common/test_quantile.cc` | Reworks distributed CPU quantile tests to validate rank error (row/column split + sparse count skew). |
| `tests/cpp/common/test_quantile.cu` | Aligns distributed GPU weighted tolerance usage with the shared weighted threshold. |
| `python-package/xgboost/testing/quantile_dmatrix.py` | Adds shared Python rank-error validation helpers and uses them in reference-cut checks. |
| `python-package/xgboost/testing/updater.py` | Adds rank-error assertions for get_quantile_cut device tests (numerical case). |
| `tests/python/test_data_iterator.py` | Replaces local rank-error helper with shared Python helper. |
| `tests/python/test_quantile_dmatrix.py` | Adds rank-error assertions for iterator-vs-array quantile cuts in training test. |
`MAX_NORMALIZED_RANK_ERROR = 2.0`
`MAX_WEIGHTED_NORMALIZED_RANK_ERROR = 14.0`
Could you please provide some brief comments on utilities here?
I will do a bit more rewriting. Also, the weighted rank error of 14 is much larger than it really should be.
Summary
This PR aligns quantile sketch sizing more closely with the single-machine algorithm and updates the test suite to validate rank-error guarantees instead of cut-value deltas.
The main functional change is on the CPU distributed sketch path: we now track the number of represented elements per feature, serialize those counts through the distributed sketch payload, and recompute `SketchSummaryBudget(...)` after merge/prune using the summed per-feature counts. This changes the distributed CPU merge budget from a fixed `O(1 / eps)` cap to the same `O(log n / eps)` budget shape used by the underlying sketch.
In addition, this PR cleans up related sizing paths and strengthens quantile accuracy coverage across C++ and Python.
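The budget recomputation described above can be sketched roughly as follows. This is an illustration of the two budget shapes only: `fixed_budget`, `log_scaled_budget`, and `merged_budgets` are hypothetical names, and the formula is an assumed stand-in with the `O(log n / eps)` shape, not the actual `SketchSummaryBudget(...)` implementation, which may use different constants, rounding, or caps.

```python
import math


def fixed_budget(eps):
    """Summary size that depends only on eps: the O(1 / eps) shape."""
    return math.ceil(1.0 / eps)


def log_scaled_budget(n_elements, eps):
    """Summary size that also grows with log2(n): the O(log n / eps) shape.

    Hypothetical formula for illustration; not the real SketchSummaryBudget.
    """
    levels = max(1.0, math.log2(max(n_elements, 2)))
    return math.ceil(levels / eps)


def merged_budgets(per_worker_counts, eps):
    """Sum per-feature element counts across workers, then size each feature."""
    totals = [sum(column) for column in zip(*per_worker_counts)]
    return totals, [log_scaled_budget(n, eps) for n in totals]


# Rows are workers, columns are features. Feature 1 is sparse and skewed:
# worker 0 holds none of its values, worker 1 holds only four.
per_worker_counts = [
    [1000, 0, 10],
    [3000, 4, 10],
]
totals, budgets = merged_budgets(per_worker_counts, eps=0.25)
```

The point of summing first is that the budget for each feature follows the global element count rather than any single worker's local view, so a worker that happens to hold few values for a feature does not shrink the merged summary.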
What This Changes
- `WQuantileSketch`
- `SketchSummaryBudget(...)`
Test Changes
- `QuantileDMatrix` / quantile-cut tests
Testing
Ran locally:
- `./build-cpu/testxgboost --gtest_filter='Quantile.*:HistUtil.*'`
- `./build-cuda-local/testxgboost --gtest_filter='HistUtil.*:GPUQuantile.*'`
- `pytest tests/python/test_data_iterator.py tests/python/test_quantile_dmatrix.py tests/python/test_updaters.py -k "test_data_iterator or test_training or test_ref_quantile_cut or test_get_quantile_cut"`
Notes
This PR is no longer limited to the CPU distributed merge/prune path. It now includes:
- `log n / eps` budget plumbing